Language classification from bilingual word embedding graphs
Abstract
We study the role of the second language in bilingual word embeddings in monolingual semantic evaluation tasks. We find strongly and weakly positive correlations between downstream task performance and the second language's similarity to the target language. Additionally, we show how bilingual word embeddings can be employed for the task of semantic language classification and that joint semantic spaces vary in meaningful ways across second languages. Our results support the hypothesis that semantic language similarity is influenced by both structural similarity and geography/contact.
Similar resources
Cross-Lingual Sentiment Classification with Bilingual Document Representation Learning
Cross-lingual sentiment classification aims to adapt the sentiment resources of a resource-rich language to a resource-poor language. In this study, we propose a representation learning approach which simultaneously learns vector representations for the texts in both the source and the target languages. Unlike previous research, which learns only bilingual word embeddings, our Bilingual Docu...
Language development and lexical awareness of bilingual (Azeri-Persian) hard of hearing impaired children
The Relationship between Mean Length of Utterance (MLU), Lexical Richness, and Syntactic and Lexical Metalinguistic Awareness in Bilingual (Turkish-Persian) Normal and Hearing-Impaired Children. Objectives: Given the impact of hearing loss on language development and metalinguistic skill, and given that language development differs from metalinguistic skill in bilingual children, studying of...
On the Role of Seed Lexicons in Learning Bilingual Word Embeddings
A shared bilingual word embedding space (SBWES) is an indispensable resource in a variety of cross-language NLP and IR tasks. A common approach to the SBWES induction is to learn a mapping function between monolingual semantic spaces, where the mapping critically relies on a seed word lexicon used in the learning process. In this work, we analyze the importance and properties of seed lexicons f...
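The mapping-based approach described in this abstract can be illustrated with a minimal sketch. It assumes a plain least-squares linear map fitted on seed-lexicon pairs, which is one common choice and not necessarily the method used in the paper. The data is synthetic, and the names `learn_mapping` and `nearest_target` are hypothetical helpers introduced only for this illustration.

```python
import numpy as np


def learn_mapping(src_vecs, tgt_vecs):
    """Fit W minimizing ||src_vecs @ W - tgt_vecs||_F over the seed pairs."""
    W, *_ = np.linalg.lstsq(src_vecs, tgt_vecs, rcond=None)
    return W


def nearest_target(query_vec, tgt_matrix):
    """Return the index of the target row most cosine-similar to query_vec."""
    tgt_norm = tgt_matrix / np.linalg.norm(tgt_matrix, axis=1, keepdims=True)
    query_norm = query_vec / np.linalg.norm(query_vec)
    return int(np.argmax(tgt_norm @ query_norm))


# Synthetic stand-in for two monolingual spaces (assumption: the target
# space is an exact, noise-free linear image of the source space).
rng = np.random.default_rng(0)
dim, n_seed, n_test = 5, 50, 10
true_map = rng.normal(size=(dim, dim))
src = rng.normal(size=(n_seed + n_test, dim))
tgt = src @ true_map

# The first n_seed rows play the role of the seed lexicon.
W = learn_mapping(src[:n_seed], tgt[:n_seed])

# Map held-out source words and look up their nearest target neighbours.
mapped = src[n_seed:] @ W
hits = sum(nearest_target(mapped[i], tgt[n_seed:]) == i for i in range(n_test))
print(f"held-out words translated correctly: {hits}/{n_test}")
```

With real embeddings the relation between spaces is only approximately linear, so the size and quality of the seed lexicon directly affect how well `W` generalizes to held-out words.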
Towards producing bilingual lexica from monolingual corpora
Bilingual lexica are the basis for many cross-lingual natural language processing tasks. Recent work has shown success in learning bilingual dictionaries by taking advantage of comparable corpora and a diverse set of signals derived from monolingual corpora. In the present work, we describe an approach to automatically learn bilingual lexica by training a supervised classifier using word embed...
A Variational Autoencoding Approach for Inducing Cross-lingual Word Embeddings
Cross-language learning allows one to use training data from one language to build models for another language. Many traditional approaches require word-level alignments of sentences from parallel corpora; in this paper we define a general bilingual training objective function requiring only a sentence-level parallel corpus. We propose a variational autoencoding approach for training bilingual word e...